Comparing Sentence-Level Features for Authorship Analysis in Portuguese
Abstract
In this paper we compare the robustness of several types of stylistic markers for discriminating authorship at the sentence level. We train an SVM-based classifier on each feature set separately and perform sentence-level authorship analysis over a corpus of editorials published in a Portuguese quality newspaper. Results show that features based on POS information, punctuation, and word and sentence length contribute to a more robust sentence-level authorship analysis.
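As a rough illustration of the punctuation and length markers mentioned in the abstract, the sketch below computes a few such features for a single sentence. The function name `sentence_features`, the chosen feature names, and the example sentence are assumptions made for illustration; the abstract does not specify the paper's exact feature definitions, and the POS-based features and the SVM training step are not reproduced here.

```python
import string


def sentence_features(sentence: str) -> dict:
    """Return simple punctuation and length features for one sentence.

    Hypothetical feature set for illustration only; not the paper's definition.
    Note: string.punctuation covers ASCII punctuation only, so marks such as
    guillemets would need to be added for real Portuguese text.
    """
    tokens = sentence.split()
    # Strip surrounding punctuation so word length counts letters only.
    words = [t.strip(string.punctuation) for t in tokens]
    words = [w for w in words if w]
    n_punct = sum(1 for ch in sentence if ch in string.punctuation)
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "n_words": len(words),          # sentence length in words
        "n_chars": len(sentence),       # sentence length in characters
        "avg_word_len": avg_word_len,   # mean word length
        "n_punct": n_punct,             # total punctuation marks
        "n_commas": sentence.count(","),
    }


feats = sentence_features("O jornal, ontem, publicou um editorial.")
print(feats)
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed to any SVM implementation, with one model trained per feature set so that the sets can be compared.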
Related papers
Deep Sentence-Level Authorship Attribution
We examine the problem of authorship attribution in collaborative documents. We seek to develop new deep learning models tailored to this task. We have curated a novel dataset by parsing Wikipedia’s edit history, which we use to demonstrate the feasibility of applying deep models to multi-author attribution at the sentence level. Though we attempt to formulate models which learn stylometric features base...
Syntactic Stylometry: Using Sentence Structure for Authorship Attribution
Most approaches to statistical stylometry have concentrated on lexical features, such as relative word frequencies or type-token ratios. Syntactic features have been largely ignored. This work attempts to fill that void by introducing a technique for authorship attribution based on dependency grammar. Syntactic features are extracted from texts using a common dependency parser, and those featur...
Commonality of neural representations of sentences across languages: Predicting brain activation during Portuguese sentence comprehension using an English-based model of brain function
The aim of the study was to test the cross-language generative capability of a model that predicts neural activation patterns evoked by sentence reading, based on a semantic characterization of the sentence. In a previous study on English monolingual speakers (Wang et al., submitted), a computational model performed a mapping from a set of 42 concept-level semantic features (Neurally Plausible ...
A Deep Context Grammatical Model For Authorship Attribution
We define a variable-order Markov model, representing a Probabilistic Context Free Grammar, built from the sentence-level, delexicalized parse of source texts generated by a standard lexicalized parser, which we apply to the authorship attribution task. First, we motivate this model in the context of previous research on syntactic features in the area, outlining some of the general strengths an...
Patterns of local discourse coherence as a feature for authorship attribution
We define a model of discourse coherence based on Barzilay and Lapata’s entity grids as a stylometric feature for authorship attribution. Unlike standard lexical and character-level features, it operates at a discourse (cross-sentence) level. We test it against and in combination with standard features on nineteen book-length texts by nine nineteenth-century authors. We find that coherence alone...